Examples of Calculations in R

David C. Howell

GreenBlueBar.gif GreenBlueBar.gif


The purpose of this section is just to show you more about how R operates. At one level R is just a great electronic calculator. At another level it creates really nice graphics, and at a third level it has thousands of built in function that you can use to do all sorts of things.

Just a calculator

The first thing that we will do is to use R as a dumb calculator. If you enter an equation (e.g. b <- 5/3) the calculator will solve the equation and store the result. If you leave off the left side of the equation, it will evaluate the right side and print it out automatically. For example

> 5/3
[1] 1.666667
# and
> b <- 5/3
b        # same as print(b)
[1] 1.666667
or
(b <- 5/3)
[1] 1.666667

The "greater than" sign in the left margin is simply the prompt showing that this is a command line. The "[1]" tells you that 1.6667 is the first of the outputs. In this case there is only one output.

Now that we have b, we can use it in a calculation. For example, we could create a variable called x and multiply b and x together

> x <- 7
> product <- b*x
> product
[1] 11.66667
> 

Those results are not particularly interesting, so let's move on to something better. As I said elsewhere ( vectors.html ) R works with vectors. That means that b can be a single number or it can be a string of numbers. Suppose that we want to find the mean of a set of numbers. We can create the set using "c" as the concatenation operator--that just means that it strings things together. For example

>
> x <- c(12, 14, 16, 14, 19, 20, 23)
> x    
[1] 12 14 16 14 19 20 23
> sum(x)
[1] 118
> length(x)
[1] 7
> mean(x)
[1] 16.85714
> 

The first line creates the set of x values, called a "vector." The second prints them out. (The [1] on the left says that 12 is the first value (of 7).) The next line says that we want to calculate the sum of the values of x. Here someone has written a function for you and named it the "sum" function. When you type sum(...), it goes and finds that function and does the work. You don't have to think about it. The next two commands compute the length (the number of observations in x) and the mean. We could also get the mean by typing

>
> xbar <- sum(x)/length(x)
print(xbar)
[1]  16.85713
or
(xbar <- sum(x)/length(x))
### Enclosing the command in parentheses causes it to print the result.

You probably aren't quite satisfied by my explanation of the [1] in the margin. Well let's create a large vector of random numbers (perhaps 100 of them) and print them out. We have a random number generater that someone wrote, made into a function, and stuck into R, so creating those numbers is a breeze. I want 100 random numbers drawn from a normal distribution with mean = 35 and standard deviation = 7.


y <- rnorm(100, 35, 7)
> y
  [1] 47.84491 34.26865 39.36875 25.22486 40.84229 37.31904 ... 
 [12] 37.93739 29.37225 29.49127 32.81972 31.38824 26.29619 ... 
 [23] 27.04887 37.99185 32.04899 38.93400 34.26171 29.55976 ... 
 [34] 38.21872 36.91282 28.74664 36.05580 32.58514 19.76187 ... 
 [45] 32.88337 29.75646 39.65179 30.31897 35.72330 30.00774 ... 
 [56] 34.21049 30.60006 42.31394 40.19359 46.54127 32.24113 ... 
 [67] 40.22709 34.05382 36.03204 40.29976 54.91402 43.98330 ... 
 [78] 36.74180 42.81791 30.68213 26.74414 22.53559 37.61425 ... 
 [89] 40.42212 31.61166 46.42465 32.86052 46.58976 34.12389 ... 
[100] 40.05273

Here 47.84491 is the first number, the first entry in the second line (37.93739) is the 12th, the first entry in the third line (27.04887) is the 23rd, and so on. The trick is that the notation x[1] is a subscript to be read as "x sub 1." x[12] is "x sub 12" and so on.

Vectors are Powerful

I am repeating myself here for those who have not read the page on vectors. Above I said that R operates with vectors that can be of any length. "27" is a vector of length 1, "34, 56, 67, 46" is a vector of length 4, etc. A very powerful thing about R is that it doesnt' care how long a vector is. It treats a multi-element vector the same way it treats a simple number. For example we all know that 5/3 is 1.6667. But what if I have a long vector?

  x <- c(3,6,8,12,15)
> x/3
[1] 1.000000 2.000000 2.666667 4.000000 5.000000
> 

Notice that when x is a vector with 5 elements, x/3 is also a vector with 5 elements, each of which is the corresponding value of x divided by 3. Now that may not sound like such a great thing, but it can save an enormous amount of work. Suppose we go back to my 100 random numbers. I want to get the mean of y, which is simple, and I want to get the sum of squares-i.e. the sum of each squared element of y. Below I show two ways of getting the latter.

>

ybar
[1] 34.9916
y2 <- y*y
y2
  [1] 2289.1351 1174.3405 1549.8981  636.2936 1668.0928 1392.7109 
  1514.1595 1322.2663 1573.5924  776.5168
 [11]  etc
sum(y2)
[1] 126599.4
# OR, to do all of that in one step,
sumy2 <- sum(y*y)
sumy2
[1] 126599.4
# or, to use the "^" operator, which raises things to a power, 
sumy2 <- sum(y^2)
> sumy2
[1] 126599.4

The last thing in that example is worth a comment. Notice that I don't have to get a vector of y^2 and then sum it. I can do it all in one statement by lumping things together. As you will see in the text, the standard deviation is the square root of 1) the sum of the squared values of y minus 2) the sum of y squared divided by N, and 3) all that divided by N-1. Here N stands for the number of elements of y. When I compute a variable, it is always around for me to use it again, unless I write over it. So, if I wanted to, I could write


N <- length(y)
sumy <- sum(y)
sumy2 <- sum(y^2)
numerator <- sumy2 - (sumy)^2/N
denominator <- N-1
stdeviation <- sqrt(numerator/denominator)
print(stdeviation)
# But I can also cram all of this onto one line
stdev <- sqrt((sum(y*y)-(sum(y)^2)/length(y))/(length(y)-1))
stdev
[1] 6.480909 
sd(y)
[1] 6.480909

Some people like to write code like the long version. I think that it is too hard to read, so I would prefer to do it in individual steps just so I'm sure what I am really doing. Otherwise it is easy to get parentheses in the wrong place, and once they are wrong, they are very hard to find. Either way works.

In the GreenBlueBar.gif dch: